AITopics | speaker verification

Collaborating Authors

speaker verification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

f0aa7e9e67515fa0c607c2959ccda6a0-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 22:00:54 GMT

icc regularizer, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Saarland > Saarbrücken (0.14)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
Asia (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

9d276b0a087efdd2404f3295b26c24c1-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 03:43:13 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.72)

Add feedback

ccf8111910291ba472b385e9c5f59099-Supplemental.pdf

Neural Information Processing SystemsFeb-11-2026, 05:31:38 GMT

behavioral stylometry, stylometry, vector, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(3 more...)

Industry: Leisure & Entertainment > Games > Chess (0.97)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.95)

Add feedback

ccf8111910291ba472b385e9c5f59099-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 05:31:36 GMT

behavioral stylometry, stylometry, vector, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(3 more...)

Industry: Leisure & Entertainment > Games > Chess (0.73)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Alpha Divergence Losses for Biometric Verification

Koutsianos, Dimitrios, Mosner, Ladislav, Panagakis, Yannis, Stafylakis, Themos

arXiv.org Artificial IntelligenceNov-25-2025

Performance in face and speaker verification is largely driven by margin-based softmax losses such as CosFace and ArcFace. Recently introduced $α$-divergence loss functions offer a compelling alternative, particularly due to their ability to induce sparse solutions (when $α>1$). However, integrating an angular margin-crucial for verification tasks-is not straightforward. We find that this integration can be achieved in at least two distinct ways: via the reference measure (prior probabilities) or via the logits (unnormalized log-likelihoods). In this paper, we explore both pathways, deriving two novel margin-based $α$-divergence losses: Q-Margin (margin in the reference measure) and A3M (margin in the logits). We identify and address a training instability in A3M-caused by sparsity-with a simple yet effective prototype re-initialization strategy. Our methods achieve significant performance gains on the challenging IJB-B and IJB-C face verification benchmarks. We demonstrate similarly strong performance in speaker verification on VoxCeleb. Crucially, our models significantly outperform strong baselines at low false acceptance rates (FAR). This capability is critical for practical high-security applications, such as banking authentication, when minimizing false authentications is paramount. Finally, the sparsity of $α$-divergence-based posteriors enables memory-efficient training, which is crucial for datasets with millions of identities.

artificial intelligence, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2511.13621

Country: Europe > Greece (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.55)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.55)
(2 more...)

Add feedback

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Wei-Ning Hsu, Yu Zhang, James Glass

Neural Information Processing SystemsNov-21-2025, 04:22:25 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, utterance, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Mississippi (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Neural Information Processing SystemsNov-20-2025, 22:22:08 GMT

We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time domain waveform samples. We demonstrate that the proposed model is able to transfer the knowledge of speaker variability learned by the discriminatively-trained speaker encoder to the new task, and is able to synthesize natural speech from speakers that were not seen during training. We quantify the importance of training the speaker encoder on a large and diverse speaker set in order to obtain the best generalization performance. Finally, we show that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high quality speaker representation.

multispeaker text-to-speech synthesis, speaker verification, transfer learning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.39)

Add feedback

DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Supervised Speech Foundational Model

Baali, Massa, Singh, Rita, Raj, Bhiksha

arXiv.org Artificial IntelligenceOct-21-2025

Self-supervised speech models have achieved remarkable success on content-driven tasks, yet they remain limited in capturing speaker-discriminative features critical for verification, diarization, and profiling applications. We introduce DELULU, a speaker-aware self-supervised foundational model that addresses this limitation by integrating external supervision into the pseudo-label generation process. DELULU leverages frame-level embeddings from ReDimNet, a state-of-the-art speaker verification model, to guide the k-means clustering step during pre-training, introducing a strong speaker-discriminative inductive bias that aligns representation learning with speaker identity. The model is trained using a dual objective that combines masked prediction and denoising, further enhancing robustness and generalization. DELULU significantly outperforms prior self-supervised learning (SSL) models across a range of speaker-centric tasks, achieving up to 62% relative improvement in equal error rate (EER) for speaker verification and consistent gains on zero-shot profiling tasks such as gender, age, accent, and speaker counting. Our findings demonstrate that DELULU is a strong universal encoder for speaker-aware speech processing, enabling superior performance even without task-specific fine-tuning.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2510.17662

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology: